Efficient and effective OCR engine training

Similar resources

Efficient OCR Training Data Generation with Aletheia*

We present how the ground-truthing tool Aletheia can be used to efficiently create training data for an open-source text recognition engine. The labelling process is sped up considerably through a top-down approach. Text content is thereby entered on region level. The characters are then propagated automatically to glyph objects. In addition, segmentation is simplified by several semi-automated...

Towards an Efficient and Effective Search Engine

Building an efficient and effective search engine requires both science and engineering. In this paper, we discuss the ATIRE search engine developed in our research lab, and both the engineering decisions and research questions that have motivated building ATIRE.

Hairetes: A Search Engine for OCR Documents

In this paper, we report on the architecture and preliminary implementation of our search engine, Hairetes. This engine is based on an extended concept of Retrieval by General Logical Imaging (RbGLI). In this extension, word similarity measures are computed by EMIM and Bayes’ theorem.

OCR with No Shape Training

We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a “clump” metric, typically yields several hundred clusters with highly skewed populations. Letter identities are assigned to each cluster by maximizing matches with a lexicon of English words. We found that for 2/3 of t...

Distributed Classifier Training for Large Scale OCR

OCRopus (www.ocropus.org) is a new open source OCR system targeted at large-scale book scanning and digital library applications, sponsored by Google for use in the Google Book system. Development started in 2007, with a beta release planned for April 2008. It is based on an earlier handwriting recognition system for U.S. Census forms. OCRopus currently contains two character recognizers (experimen...

Journal

Journal title: International Journal on Document Analysis and Recognition (IJDAR)

Year: 2019

ISSN: 1433-2833, 1433-2825

DOI: 10.1007/s10032-019-00347-8